*************************************************************************
* Myricom MX networking software and documentation                      *
* Copyright (c) 2006 by Myricom, Inc.                                   *
* All rights reserved.  See the file `COPYING' for copyright notice.    *
*************************************************************************

README of MX

MX, or Myrinet Express, is a low-level communication layer for Myrinet.

Table of Contents:

I. Directory structure of MX distribution
II. Installation
   1. Configuring and compiling MX.
   2. Installing the MX mcp and driver.
      a. Load-time Tunable Parameters
      b. Run-Time Tunable Parameters
      c. Modifying MX library behavior at Run-time
   3. Enabling IP connectivity (OPTIONAL)
III. MX Tool/Utility Functions and Test Programs
IV. MX Performance
V. Caveats
   a. Write-combining on i386 and x86_64 hosts

=============================================
I. Directory Structure of the MX distribution
=============================================

mx
|-- common
|-- doc                   MX API Documentation
|-- driver                MX Kernel Drivers
|   |-- common
|   |-- freebsd
|   |-- linux
|   |-- macosx
|   |-- solaris
|   `-- windows
|-- libmyriexpress        User-level MX API
|-- libmyriexpresstcp     User-level MX API simulated over tcp
|-- mapper-2xp            MX Mapper
|-- mcp                   MX Myrinet Control Program (MCP)
|-- tests                 MX Test Suite
|   `-- interp            mx_interp tests
`-- unit_test

================
II. Installation
================

MX is supported on the following operating systems and processors:

  Linux 2.6 and Linux 2.4 for i386, ia64, x86_64 (including AMD64 and
    EM64T), ppc, ppc64 (including Power4 and Power5).
  Solaris 10 for SPARC and AMD64.
  FreeBSD 5.x for i386, AMD64.
  MacOSX 10.3 and 10.4 for G5 and Intel.

MX-2G Supported NICs:  PCIXD, PCIXE, PCIXF
MX-10G Supported NICs: 10G-PCIE-8A-{C,R,Q}

MX-2G can be used only in Myrinet mode, with the NICs connected to a
Myrinet-2000 switch.

MX-10G can be used in Ethernet mode or Myrinet mode.
If the Myri-10G NICs are connected to a Myri-10G switch, MX-10G should
use Myrinet mode.
If the Myri-10G NICs are connected to a 10-Gigabit Ethernet switch,
MX-10G should use Ethernet mode.

MX installation is performed in the following three steps.

1. Configuring and compiling MX:
--------------------------------

  cd $MX_HOME
  ./configure
  make

By default, we assume that you would like to use MX in Myrinet mode.
If you are using MX-10G, and you would like to use MX in Ethernet mode,
you must configure MX-10G with the following option:

  ./configure --enable-ether-mode

By default, we assume that the header and config files of your Linux
kernel (required to compile outside modules, and part of either a
kernel-headers or kernel-source package depending on your distribution)
are pointed to by /lib/modules/`uname -r`/{source,build}.

If your Linux installation is not standard, or you are cross-compiling
for a kernel different from the one on the compile node, you must
configure with the following option:

  ./configure --with-linux=

where the value given to --with-linux is the directory of the Linux
kernel source. The kernel header files MUST match the running kernel
exactly: not only should they both be from the same version, but they
should also contain the same kernel configuration options.

For 2.6 kernels, the kernel headers/scripts often come in two parts in
two different directories; you might need to use both --with-linux and
--with-linux-build. For instance, to select a specific kernel, you might
need something like:

  ./configure --with-linux=/usr/src/linux-2.6.5-7.151/ \
     --with-linux-build=/usr/src/linux-2.6.5-7.151-obj/x86_64/smp/

Additional configure options are available:

  --enable-32b          enable 32-bit library
  --enable-64b          enable 64-bit library
  --enable-kernel-lib   enable kernel library
  --prefix=             install directory (default /opt/mx)
  --disable-sse2        when using i386 processors without sse2
                        (like P3, 32-bit-only Athlon and below)
  --disable-fms         use the old mapper implementation

2. Installing the MX mcp and driver:
------------------------------------

Select an installation directory path.
It is usually best for this path to be an NFS directory available on all
machines that will share this MX installation. The directory must be
accessible using the same path on all machines that will share the
installation. The path must be absolute; it must start with "/".
However, it may contain symbolic links.

  make install prefix=

If you omit prefix=, the mcp and driver will be installed in the
directory specified with the configure "--prefix" option, or the default
directory, /opt/mx/.

The MX binaries are located in /bin and /sbin. The 32-bit MX libraries
are installed in /lib32 and the 64-bit MX libraries are installed in
/lib64. The /lib directory is a symbolic link to either lib64 or lib32
depending on the native wordsize detected by configure. E.g., on most
ppc64 distributions, gcc defaults to 32-bit, which means that lib links
to lib32. However, on most x86_64 distributions, gcc defaults to 64-bit,
so lib links to lib64.

Unless specified on the configure line, MX builds 32-bit libraries on
32-bit architectures (i386, ppc) and 64-bit libraries on 64-bit
architectures (ia64, AMD64, ppc64, Alpha). It is possible to build both
by using the '--enable-32b' and '--enable-64b' configure flags.

For Mac OS X, when the Apple Xcode compiler is 64-bit, MX fat libraries
are built which are usable by both 32-bit and 64-bit applications, and
libraries are always installed in /lib.

For Linux, FreeBSD and Solaris, add the MX library directory to the
system library search path. Otherwise, individual users will have to
either manage their LD_LIBRARY_PATH(_64) environment variable or link
their program with an "-rpath/-R" option for the dynamic linker to
locate the MX shared library.

Next, you must run

  su root
  /sbin/mx_local_install

on each machine to perform local install steps such as:

  * Linux: create the devices (/dev/mx* and /dev/mxp*), one device
           per NIC.
           install the init script in /etc/init.d if applicable.
  * FreeBSD: update the devd configuration.
             install the init script in /etc/init.d if applicable.
  * MacOSX: install the module in the load directory.
  * Solaris: update /etc/devlink.tab.
             install the init script in /etc/init.d if applicable.

Last, you must run

  su root
  /sbin/mx_start_stop start

on each machine to load the modules. In Myrinet mode, this script also
starts a mapper for each Myrinet NIC in the machine.

If applicable, the mx_start_stop script is also available in
/etc/init.d/mx. Available actions are:

  - start: unload GM if needed and load MX; the mapper is started
    automatically.
  - stop: stop the mapper and unload MX.
  - start-mapper: start the mapper manually (for Myrinet mode, experts
    only).
  - stop-mapper: stop the mapper manually (for Myrinet mode, experts
    only).
  - status: indicate if MX is loaded.
  - restart: stop the mapper, unload MX and reload MX.

Note: The MX software provides a separate kernel module for the mcp
(firmware) and the driver. If you do not use the mx_start_stop script to
load the MX drivers and start the MX mapper, you must ensure that the
mcp module is loaded first, as the driver module depends on it.

a. Load-time Tunable Parameters:
-------------------------------

The MX driver and mcp contain a number of tunable parameters which may
be adjusted when the MX driver is loaded. (Note that we do NOT recommend
modifying driver parameters unless you are confident in what you are
doing.) These parameters are set in an OS-dependent manner.

On Linux, there are three possibilities:

  insmod /sbin/mx_driver.o mx_PARAM=VALUE

Or, you can pass driver parameters in the MX_MODULE_PARAMS environment
variable like:

  env MX_MODULE_PARAMS="mx_PARAM=VALUE" mx_start_stop start

Or, you may pass parameters after the action on the command line like:

  mx_start_stop start mx_PARAM=VALUE ...

On Linux 2.6, some parameters may also be changed after loading by
writing into /sys/module/mx_driver/parameters/mx_PARAM.
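
On Linux 2.6, the values currently in effect can also be read back from
the same sysfs directory. The following is a minimal Python sketch,
assuming each parameter appears there as a plain-text file named after
the parameter. The helper name read_mx_params and its params_dir
argument are hypothetical and are not part of the MX distribution.

```python
import os


def read_mx_params(params_dir="/sys/module/mx_driver/parameters"):
    """Return {parameter name: current value} for readable files in params_dir.

    Hypothetical helper, not part of MX. An empty dict is returned when the
    directory does not exist (for example, when the driver is not loaded).
    """
    values = {}
    if not os.path.isdir(params_dir):
        return values
    for name in sorted(os.listdir(params_dir)):
        path = os.path.join(params_dir, name)
        try:
            with open(path) as f:
                values[name] = f.read().strip()
        except OSError:
            # Some entries may not be readable by the current user; skip them.
            continue
    return values


if __name__ == "__main__":
    for name, value in read_mx_params().items():
        print(f"{name} = {value}")
```

Reading is safe for unprivileged inspection; writing a new value still
requires root and only works for the parameters that the driver exposes
as writable.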

On FreeBSD:

  kenv mx.PARAM=VALUE
  kldload /sbin/mx.ko

On MacOSX:

  Find old value of boot args:
    nvram -p | grep boot-args
  Append desired parameter to boot-args:
    nvram 'boot-args=$OLD_BOOT_ARGS mx.PARAM=VALUE'
  Reboot:
    shutdown -r now

On Solaris:

  Edit /kernel/drv/mx_driver.conf, and set mx_PARAM=VALUE;

The current tunable parameters include:
(remember that 'mx_PARAM=VALUE' must be replaced with 'mx.PARAM=VALUE'
on FreeBSD and MacOSX)

  * mx_debug_mask: default = 0
    This is a bitmap of debug messages to be printed when the driver is
    configured with --enable-debug. See common/mx_debug.h

  * mx_max_instance: default = 4
    This is the maximum number of Myrinet NICs the driver can support.

  * mx_max_endpoints: default = 4
    This is the maximum number of endpoints per Myrinet NIC.

  * mx_max_nodes: default = 1024
    This is the maximum number of remote nodes supported.

  * mx_max_send_handles: default = 32
    This is the maximum number of simultaneous sends in the NIC.

  * mx_mapper_path: default = /opt/mx/sbin/mx_start_mapper
    String specifies the path to a script which is run by the driver to
    start the mapper. This is used internally by the mx_start_stop
    script.

  * mx_ether_rx_frags: default = 0
    When enabled, the Linux ethernet driver will use fragmented skbufs
    to receive big ethernet frames. This is intended to work around
    problems allocating jumbo frames on machines with little free
    memory.

  * mx_small_message_threshold: default = 128
    Size (in bytes) below which messages will be written with PIO on the
    sending side.

  * mx_medium_message_threshold: default = 32768
    Size (in bytes) below which messages will be copied prior to
    transmission, rather than being DMA'ed directly from their current
    location.

  * mx_override_e_to_f: default = 0
    When enabled, the driver treats PCIXE cards as if they were PCIXF
    cards, and disables the second port. This is primarily for testing
    and development.

  * mx_security_disabled: default = 0
    If security is disabled, unprivileged users may do things like clear
    the mcp counters, and examine the Lanai sram.

  * mx_msi: default = 0
    When enabled, and when kernel support is present, the Linux driver
    will use Message Signaled Interrupts, rather than legacy PCI
    interrupts.

  * mx_intr_coal_delay: default = 10
    This is the maximum delay before raising an interrupt when
    coalescing (in microseconds).

b. Run-time Tunable Parameters:
------------------------------

On Solaris:

The per-project memory limits are quite low by default. This value can
be queried via:

  prctl -n project.max-device-locked-memory -i project default

When an application needs more memory to be locked than the limit
allows, it may fail with a "pin failure case not implemented" message.

To modify this setting, change the "default" line of /etc/project to
read:

  default:3::::project.max-device-locked-memory=(priv,NNN,deny)

where NNN is the new value you want for max-device-locked-memory. This
setting will take effect at next login. NOTE that you cannot use MB or
GB modifiers here; NNN must be a simple integer.

c. Modifying MX Library Behavior at Run-time:
--------------------------------------------

The MX library behavior and functionality can be modified with run-time
environment variables, and/or configure-time options. The defaults are
intended to work well with all applications, so this documentation is
meant for advanced users.

These run-time environment variables are categorized as follows:

* Registration cache:

  MX_RCACHE=[0|1] (default=0)
    The MX_RCACHE environment variable is described in the FAQ entry
    "How do I obtain the maximum bandwidth performance with MPICH-MX?"
    (http://www.myri.com/cgi-bin/fom?file=463).

* Communication channels:

  There are 3 communication channels used by the MX library:

    * the network channel, where messages go through the NIC and the
      network.
    * the shared-memory channel, where messages are exchanged between
      processes on the same machine by use of shared memory and special
      system calls implemented by the MX driver.
    * the self channel, for intra-process messages, implemented
      completely internally to the process.

  The use of these channels is regulated by two environment variables:

  MX_DISABLE_SHMEM=[0|1] (default=0)
    Setting this variable to 1 will disable the shared-memory channel.
    Communication for endpoints on the same machine will always go
    through the network.

  MX_DISABLE_SELF=[0|1] (default=0)
    Setting this variable to 1 will disable the self-communication
    channel. In typical usage you will also set MX_DISABLE_SHMEM to 1,
    and even intra-process messages will go through the network. If
    only MX_DISABLE_SELF is set and not MX_DISABLE_SHMEM, the
    shared-memory channel implementation will be used for intra-process
    communications.

  MX_DISABLE_SHMEM and MX_DISABLE_SELF should be set consistently among
  the different MX processes within the same job.

  See also "How is software loopback implemented in MX?"
  (http://www.myri.com/cgi-bin/fom?file=450).

* Statistics:

  MX_STATS=[0|1] (default=0)
    Setting this variable to 1 will enable the reporting of statistics
    about various events in the library on a per-endpoint basis.
    Statistics are displayed when the endpoint is closed.

3. Enabling IP connectivity (OPTIONAL):
--------------------------------------

* Linux and FreeBSD:

  If you wish to enable IP connectivity (Ethernet emulation in Myrinet
  mode, or Ethernet driver for Ethernet mode), the command is as
  follows:

    /sbin/ifconfig myri0 up

  where you must replace myri0 with the appropriate name (myri1, myri2,
  etc.) if you have more than one Myrinet NIC per host.

* Solaris:

  If you wish to run IP over Myrinet (ethernet emulation), the command
  to enable IP over MX is as follows:

    ifconfig myri0 plumb up

  where you must replace myri0 with the appropriate name (myri1, myri2,
  etc.) if you have more than one Myrinet NIC per host.

  Note for Solaris 10GA: Due to a bug, Solaris 10GA does not support
  9000 byte (jumbo) frames as released. Jumbo frames are critical for
  obtaining full ethernet performance, so we encourage you to apply
  Sun's patch 119832-01 for UltraSPARC or patch 119833-01 for AMD64.
  For patch access, refer to the SunSolve Patch Access web page. If you
  do not apply this patch, you will need to set the Ethernet MTU to 1500
  bytes in /kernel/drv/myri.conf by specifying mx_mtu_override=1500 when
  loading MX. No jumbo frames will be allocated until after the myri0
  interface is plumbed.

* MacOSX:

  You should configure the MX ethernet emulation interface as you would
  any other ethernet interface. On most systems, the MX ethernet adaptor
  will appear as "en1". It is possible, if you have additional network
  cards, that the adaptor will appear as "en2", "en3", etc.

  To verify which ethernet adaptor belongs to MX, you may need to run
  the "Network Utility" (/Applications/Utilities/Network Utility). Click
  the "Info" tab and select each "Network Interface" from the menu until
  you find the one whose Vendor is Myricom, and whose Hardware Address
  matches the MAC address printed by "mx_info".

  Once you have found the correct adaptor, configure it via:

    System Preferences -> Network -> Show -> Ethernet Adaptor (enX)

================================================
III. MX Tool/Utility Functions and Test Programs
================================================

A variety of MX Tool Programs

  mx_counters
  mx_dmabench
  mx_endpoint_info
  mx_hostname
  mx_info

are available in the /bin/ directory.

Test programs are also available in the /bin/tests directory.

Refer to the /bin/README for details.

==================
IV. MX Performance
==================

The MX Benchmark Programs

  mx_pingpong
  mx_pingpong_unex
  mx_stream

are located in the /bin/tests/ directory. Directions for running these
benchmark programs can be found in the /bin/README.

===========
V. Caveats
===========

a. Write-combining on i386 and x86_64 hosts

For optimal performance of MX on i386 and x86_64 hosts, write-combining
must be enabled on the PCI chipset of the host. MX will enable
write-combining on ia32 and x86_64 hosts when there are no conflicting
attribute regions for physical or PCI memory pre-existing at driver load
time. If MX is unable to enable write-combining at load time, an error
message like

  mtrr: type mismatch for fd000000,1000000 old: uncachable new: write-combining

will appear in the kernel log.

Refer to the Myrinet FAQ entry "When I load MX-2G, I see an error
message about "mtrr: type mismatch ... write-combining". What does this
error message mean?" (http://www.myri.com/cgi-bin/fom?file=416).
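
To check a saved copy of the kernel log for this message, the line can
be matched with a regular expression. The following is a minimal Python
sketch, assuming the message keeps the format shown above. Treating the
two hexadecimal fields as the base address and size of the region is an
interpretation that this README does not state. The helper name
find_mtrr_mismatches is hypothetical and is not an MX tool.

```python
import re

# Matches lines such as:
#   mtrr: type mismatch for fd000000,1000000 old: uncachable new: write-combining
MTRR_RE = re.compile(
    r"mtrr: type mismatch for ([0-9a-fA-F]+),([0-9a-fA-F]+)\s+"
    r"old: (\S+)\s+new: (\S+)"
)


def find_mtrr_mismatches(log_text):
    """Return a list of (base, size, old_type, new_type) tuples.

    base and size are parsed as hexadecimal integers; interpreting them as
    region base and size is an assumption (see the note above).
    """
    results = []
    for match in MTRR_RE.finditer(log_text):
        base = int(match.group(1), 16)
        size = int(match.group(2), 16)
        results.append((base, size, match.group(3), match.group(4)))
    return results


if __name__ == "__main__":
    import sys

    # Usage: pass kernel log text on standard input.
    for base, size, old, new in find_mtrr_mismatches(sys.stdin.read()):
        print(f"region {base:#x} size {size:#x}: {old} -> {new}")
```

An empty result means no such message was found in the text supplied,
which by itself does not prove that write-combining is enabled.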